BACKGROUND OF THE INVENTION



1. Field of the Invention 

This invention relates to computer networks, and more particularly to a system
and method for providing a distributed information discovery platform that enables
discovery of information from distributed information providers.

2. Description of the Related Art 


It has been estimated that the amount of content contained in distributed 
information sources on the public web is over 550 billion documents. In comparison, 
leading Internet search engines may be capable of searching only about 600 million pages 
out of an estimated 1.2 billion "static pages." Due to the dynamic nature of Internet 
content, much of the content is unsearchable by conventional search means. In addition,
the amount of content unsearchable by conventional means is growing rapidly with the 
increasing use of application servers and web enabled business systems. 

Crawlers currently may take three months or more to crawl and index the web 
(Google numbers), so that conventional, crawler-based search engines such as Google
may best perform when indexing static, slowly changing web pages such as home pages 
or corporate information pages. Targeted or restricted crawling of headline or other 
metadata is possible (such as that done by moreover.com) but this limits search ability. 
Web resources that do not have a "page of contents" or similar index ("deep" web
resources) may be more difficult to search, index, or reference by conventional
crawler-based search engines. For example, Amazon.com contains millions of product
descriptions in its databases but does not have a set of pages listing all these descriptions. 
As a result, in order to crawl such a resource, it may be necessary — though difficult — to 
query the database repeatedly with every conceivable query term until all products are 
extracted. Likewise, because many web pages are generated dynamically given information about
the consumer or context of the query (time, purchasing behavior, location, etc.), a crawler
approach is likely to lead to distortion of such data. In some situations, content may be
inaccessible due to access privileges (e.g. a subscription site), or for security reasons (e.g. 
a secure content site). 


Conventional search mechanisms also may be less efficient than desirable for
some types of information providers, for example when accessing
dynamic content from a news site. A current news provider may provide content created
by editors and stored in a database as XML or other presentation-neutral form. The news
provider's application server may render the content as a web page with associated links
using templates. Although the end user may see a well-presented page with the relevant
information, for a crawler-type search engine to extract the content of the HTML page it
must be programmed to use information about the structure of the page and "scrape" the
content and headline from the page. It may then store this content or a processed version
for indexing purposes in its own database, and retrieve the link and story when a query
matching the story is submitted. This search process is inherently inefficient and prone to
errors. In addition, it gives the content provider no control over the format of the article or
the decision about which article to show in response to a query.

It would be desirable for a search mechanism for the web to perform "deep searches"
and "wide searches." "Deep searches" may find information embedded in large databases
such as product databases (e.g. Amazon.com) or news article databases (e.g. CNN).
"Wide searches" may reach a large distribution. Moreover, it would be desirable for the
search mechanism to efficiently use bandwidth and maximize search speed while
avoiding bottlenecks. It would also be desirable for a search mechanism to function over
an expanded web covering a wide array of distributed devices (e.g. PCs, handheld
devices, PDAs, cell phones, etc.).
SUMMARY OF THE INVENTION



A distributed network search mechanism is described for a consumer coupled to a
network to send a search request to and receive a search result from at least one provider
coupled to the network in response to its search request. A search request may include a
search query. A search result may include a query result. A search request and a search
result may be formatted according to a query routing protocol (QRP). A QRP may
specify a mark-up language format for communicating search requests, search results,
and/or other information between nodes in the network.
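
As an illustration only, a mark-up formatted search request and search result might be built and parsed as in the following Python sketch. The text above states only that a QRP may specify a mark-up language format; the element names ("request", "query", "response", "result"), the "consumer" attribute, and the function names are assumptions made for this example and are not taken from any QRP definition.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# NOTE: every element and attribute name here is hypothetical. The source
# text only says a QRP may specify a mark-up language format.


def build_search_request(consumer_id: str, query: str) -> str:
    """Serialize a search request that carries one search query."""
    root = ET.Element("request", {"consumer": consumer_id})
    ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")


def parse_search_request(message: str) -> Tuple[Optional[str], Optional[str]]:
    """Recover (consumer id, search query) from a serialized request."""
    root = ET.fromstring(message)
    return root.get("consumer"), root.findtext("query")


def build_search_result(query: str, hits: List[str]) -> str:
    """Serialize a search result that carries one or more query results."""
    root = ET.Element("response")
    ET.SubElement(root, "query").text = query
    for hit in hits:
        ET.SubElement(root, "result").text = hit
    return ET.tostring(root, encoding="unicode")
```

In this sketch, a node that receives a message would call parse_search_request to obtain the search query before acting on it; any real deployment would follow whatever schema the protocol actually defines.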


A network hub may be configured to implement a search method according to a
query routing protocol. The search method may include receiving a search request from a
consumer. A network hub may accept search requests only from registered consumers.
A network hub may be configured to receive registration requests from consumers. A
network hub may be configured to receive registration requests from providers. A
registration request may be formatted according to a QRP. A provider's registration
request may indicate at least some of the search queries the provider is interested in
receiving. The search method may include resolving a consumer's search query from a
search request by determining at least one provider that indicated interest in receiving at
least similar search queries in its registration request. A network hub may be configured
to route a consumer's search query to a provider and may format the search query
according to a QRP.
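
The registration, resolution, and routing steps described above may be sketched, purely for illustration, as the following in-memory Python model. The class name Hub, its method names, and the modeling of a provider as a callable are assumptions made for this example. In particular, the text does not say how "at least similar" search queries are determined; the sketch substitutes a simple keyword-overlap test for that step.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical model: a provider is a callable that takes a search query
# string and returns a list of query result strings.
Provider = Callable[[str], List[str]]


@dataclass
class Hub:
    consumers: Set[str] = field(default_factory=set)
    # provider name -> (keywords the provider registered interest in, handler)
    providers: Dict[str, Tuple[Set[str], Provider]] = field(default_factory=dict)

    def register_consumer(self, consumer_id: str) -> None:
        self.consumers.add(consumer_id)

    def register_provider(
        self, name: str, keywords: Iterable[str], handler: Provider
    ) -> None:
        self.providers[name] = ({k.lower() for k in keywords}, handler)

    def resolve(self, query: str) -> List[str]:
        """Return names of providers whose registered keywords overlap the query.

        Keyword overlap is an assumed stand-in for matching "similar" queries.
        """
        terms = set(query.lower().split())
        return sorted(
            name for name, (keywords, _) in self.providers.items() if keywords & terms
        )

    def route(self, consumer_id: str, query: str) -> Dict[str, List[str]]:
        """Send the query to each resolved provider; reject unregistered consumers."""
        if consumer_id not in self.consumers:
            raise PermissionError(f"consumer {consumer_id!r} is not registered")
        return {name: self.providers[name][1](query) for name in self.resolve(query)}
```

A provider that registered interest in "book" and "novel" would, under this sketch, receive the query "rare novel" but not the query "stock quotes", and a query from an unregistered consumer would be rejected before any provider is contacted.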

A provider may be configured to receive a search query. A provider may respond
with a query result. A provider may be configured to customize its query result. A query
result may be formatted according to a QRP. The query result may be routed to a network
hub. A network hub may be configured to receive a query result from a provider. A
network hub may be configured to collate a plurality of query results regarding the same
search query. A network hub may be configured to route a query result or collated query
results to a consumer as a search result. A search result may be formatted according to a